
Optimal Projection Method in Sphere Decoding 

Arash Ghasemmehdi and Erik Agrell 






Abstract — An entirely different approach to complexity reduc- 
tion in sphere decoders is taken. Here we demonstrate that most 
of the calculations in the standard algorithms are in fact redun- 
dant in the sense that the calculated values are never used. This 
applies to all recursive sphere decoder algorithms, including the 
numerous variations of the Fincke-Pohst and Schnorr-Euchner 
strategies. We propose a method, which is applicable to lattices as 
well as finite constellations, to avoid these redundant calculations, 
thus reducing the complexity. We emphasize that the algorithms 
otherwise perform exactly as before, visiting the same points 
in the same order, and returning the same result. Pseudocode 
is given to facilitate immediate implementation. In simulation 
results, it is shown that the relative complexity gain with the 
proposed add-on goes up linearly as the dimension of the lattice 
increases. For instance, the complexity is reduced to one fourth 
for lattices at dimension sixty. 

Index Terms — Closest point search, Fincke-Pohst, lattice, 
Lenstra-Lenstra-Lovasz (LLL) reduction, maximum likeli- 
hood (ML) detection, multiple-input multiple-output (MIMO), 
Schnorr-Euchner, sphere decoder, Voronoi region. 



I. Introduction 

EVERY lattice is represented by its generator matrix G,
whose entries are real numbers. Let n and m denote the
number of rows and columns of G, respectively, with $n \le m$.
The rows of G, denoted $b_1, \ldots, b_n$, are called basis vectors
and are assumed to be linearly independent vectors in $\mathbb{R}^m$.
The lattice of dimension n is defined as the set of points 



$$\Lambda(G, \mathbb{Z}) = \{u_1 b_1 + \cdots + u_n b_n \mid u_i \in \mathbb{Z}\}. \tag{1}$$



Every lattice point is surrounded by a region, which is known as
the Voronoi region. The Voronoi region of a lattice point is the
set of all vectors in $\mathbb{R}^m$ that have a shorter Euclidean distance to
this lattice point than to any other point in $\Lambda(G, \mathbb{Z})$. Hence, the
set of all Voronoi regions tiles the space $\mathbb{R}^m$ without overlap,
disregarding the boundaries.

Finding the closest lattice point to a given vector $r \in \mathbb{R}^m$
is equivalent to finding the Voronoi region that this vector
belongs to. It requires minimization of the metric $\|r - uG\|$
over all lattice points uG. However, complete enumeration of
the points is not feasible. The main idea of closest point search
algorithms is to minimize the metric only over the lattice points
located inside a hypersphere centered on r, and thereby reduce the
number of points that have to be enumerated. For lattices with
known structural properties, closest point search algorithms 
could be modified to avoid some unnecessary numerical op- 
erations [1], [2]. However, in our case where lattices without 
any specific structures are addressed we have to implement a 
brute force search to find the nearest lattice point [3], [4]. 
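
As a point of reference for the search problem described above, the following is a minimal sketch of exhaustive enumeration over a bounded box of integer vectors. The function name `brute_force_closest` and the box half-width `span` are assumptions made for illustration only; the number of visited points grows as (2·span+1)^n, which is the cost that sphere decoders are designed to avoid.

```python
import itertools
import math

def brute_force_closest(G, r, span=3):
    """Exhaustively minimise ||r - uG|| over integer vectors u in [-span, span]^n.

    G is a list of n basis vectors (rows), r a list of m coordinates.
    Only usable as a reference for tiny n.
    """
    n, m = len(G), len(r)
    best_u, best_d2 = None, math.inf
    for u in itertools.product(range(-span, span + 1), repeat=n):
        # lattice point x = uG (row-vector convention used in the text)
        x = [sum(u[i] * G[i][j] for i in range(n)) for j in range(m)]
        d2 = sum((r[j] - x[j]) ** 2 for j in range(m))
        if d2 < best_d2:
            best_u, best_d2 = u, d2
    return best_u, best_d2
```

The returned squared distance can serve as a ground truth when checking a faster decoder on small examples.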

In 1981 Pohst [5] suggested a way of finding the closest 
point in lattices, which later on was complemented by Fincke 
and Pohst in [6]. The implementation details of Fincke-Pohst 
(FP) enumeration method were first presented by Viterbo and 



Biglieri in [7]. In 1999, Viterbo and Boutros applied the FP 
enumeration method to maximum likelihood (ML) detection 
for finite constellations [4]. Later on, Agrell et al. in [3] 
illustrated that the Schnorr-Euchner (SE) refinement [8] of 
the FP enumeration strategy improves the complexity of the 
sphere decoder algorithm. It combines the advantages of the 
Babai nearest plane algorithm [1] and the FP strategy. 

A given lattice can be represented by many different sets 
of basis vectors. The efficiency of the closest point search 
algorithms can be enhanced if the basis vectors of the lattice 
are reduced. Reduction in lattices is a way of making the basis 
vectors as short as possible and fairly orthogonal to each other, 
without changing the structure of the lattice, in order to shorten 
the overall search time. In 1982 Lenstra, Lenstra, and Lovász
[9] achieved a breakthrough by constructing an algorithm that
quickly produces a basis that is reduced in a certain sense, a
so-called LLL-reduced basis, for any given lattice basis. The
implementation details of LLL reduction are well presented
in [9], [10].

During the last decade a lot of work has been done to
improve the efficiency of sphere decoder algorithms [11]-[15],
owing to the demand for them in numerous
applications. In communication theory, the closest point
problem arises in many settings. ML
detection in multiple-input multiple-output (MIMO) channels
[11], [16]-[19], quantization [20], vector perturbation in multiuser
communications [21], joint detection in direct-sequence
multiple access systems [22], multiple symbol differential
detection [23], and Max-Log-MAP detection [24] are some
examples.

The closest point search algorithms can be modified to find 
the ML point in finite constellations [4], [11], which has an 
important application in MIMO channels. Assuming a system 
with n transmit and m receive antennas, the new set of points
$\Lambda(G, U)$ is defined by replacing $\mathbb{Z}$ in (1) with the finite range
of integers

$$U = \{U_{\min}, U_{\min} + 1, \ldots, U_{\max}\}. \tag{2}$$



The transmit set can be mapped to an L-PAM constellation
with $L = U_{\max} - U_{\min} + 1$. The received vector after an
additive white Gaussian noise (AWGN) channel with double-sided
noise power spectral density $N_0/2$ is

$$r = uG + \mathbf{n}, \tag{3}$$

where $u \in U^n$, $r \in \mathbb{R}^m$, $G \in \mathbb{R}^{n \times m}$, and $\mathbf{n} \in \mathbb{R}^m$ is a vector of
independent and identically distributed (i.i.d.) Gaussian noise
with variance $N_0/2$. In this case ML detection is equivalent
to minimization of the metric $\|r - uG\|$ over all possible
points uG with $u \in U^n$. In MIMO systems where usually
quadrature amplitude modulation (QAM) is used, the $L^2$-QAM
signal constellation can be viewed as two real-valued
L-PAM constellations with $u \in U^{2n}$, $r \in \mathbb{R}^{2m}$, $G \in \mathbb{R}^{2n \times 2m}$,
and $\mathbf{n} \in \mathbb{R}^{2m}$.
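
The doubling of dimensions mentioned above can be illustrated with a short sketch. The stacking order (real parts first, then imaginary parts) and the helper names `complex_to_real` and `vec_mat` are assumptions for illustration; the text does not specify a particular ordering.

```python
def complex_to_real(G_c, u_c):
    """Map a complex n x m model x = uG to an equivalent real 2n x 2m model."""
    # G_real = [[Re G, Im G], [-Im G, Re G]] so that
    # [Re u, Im u] G_real = [Re(uG), Im(uG)] in the row-vector convention
    top = [[g.real for g in row] + [g.imag for g in row] for row in G_c]
    bottom = [[-g.imag for g in row] + [g.real for g in row] for row in G_c]
    u_real = [z.real for z in u_c] + [z.imag for z in u_c]
    return top + bottom, u_real

def vec_mat(u, G):
    """Row vector times matrix."""
    return [sum(u[i] * G[i][j] for i in range(len(u))) for j in range(len(G[0]))]
```

With this mapping, each complex symbol contributes two real PAM coordinates, so a real-valued closest point search can be applied unchanged.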

In this paper we take a completely new approach to reduce 
the complexity of the sphere decoder algorithms, thus drawing 
attention to a hitherto unnoticed problem with the standard 
algorithms. We show that the sphere decoder
algorithms based on the FP [6] and SE [8] enumeration strategies
are not well optimized, in the sense that they perform many
redundant numerical operations and calculate variables that
are never used. A method is therefore proposed to
avoid these unnecessary computations. The revision
proposed is not related to choosing a more accurate upper
bound or to scanning the layers in a better order; we believe that the
SE strategy is the best way in this regard. Our modification
instead changes how the received vector r is projected onto
the basis vectors, which accounts for most of the floating point
calculations in the sphere decoder. With the proposed method,
no value is calculated twice or left
unused. This is what we mean by an optimal projection
method.

Based on the proposed modification, an algorithm inspired
by the SE enumeration strategy is established. A standalone
representation of the algorithm is included in Sec. IV. The
section also contains the modification needed to implement
the new algorithm for finite constellations.

II. Closest Point Search Algorithms 

A conceptual description of the sphere decoder
algorithms is presented here. Although it follows
the same basic principles as the enumeration
methods explained in [3]-[8], it is needed to understand
the main contribution of the paper.

Using the so-called QR decomposition, any real-valued
n x m matrix G with $n \le m$ can be factorized as G = RQ,
where R is an n x n lower-triangular matrix and $QQ^T = I$.
Using the matrix R, which can be seen as a rotated and
reflected version of G, is much more convenient than using the
lattice generator matrix G itself. The obtained lattice point
can then easily be mapped back to its original place through rotation
and reflection [3]. Hence, without loss of generality, we
assume that G is a square lower-triangular matrix. For ease of
explanation and visualization, the matrix $H = G^{-1}$ is used instead
of G. The elements of H are named according to the
convention



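$$H = \begin{pmatrix} H_{1,1} & 0 & \cdots & 0 \\ H_{2,1} & H_{2,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ H_{n,1} & H_{n,2} & \cdots & H_{n,n} \end{pmatrix}, \tag{4}$$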



where H is an n x n lower-triangular matrix with positive 
diagonal elements. 
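
Since G is assumed square and lower-triangular, H can be obtained by forward substitution. The sketch below is one possible way to do this; the function name `invert_lower_triangular` is an assumption, and the routine relies on G having a nonzero (here positive) diagonal.

```python
def invert_lower_triangular(G):
    """Return H = G^{-1} for a square lower-triangular G with nonzero diagonal."""
    n = len(G)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):                 # build column j of H
        H[j][j] = 1.0 / G[j][j]
        for i in range(j + 1, n):
            # (row i of G) times (column j of H) must vanish for i > j
            s = sum(G[i][k] * H[k][j] for k in range(j, i))
            H[i][j] = -s / G[i][i]
    return H
```

The diagonal of the result is $H_{i,i} = 1/G_{i,i}$, so a positive diagonal in G gives a positive diagonal in H, as required by (4).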

Every lattice can be divided into layers of lower-dimensional
lattices. The diagonal elements of H determine the distances
between these layers, such that $1/H_{i,i}$ represents the distance
between the (i - 1)-dimensional layers in an i-dimensional
layer. Thus $1/H_{1,1}$ is the distance between the lattice points
in a one-dimensional layer.

We assume that we have a received vector r in an n-dimensional
Euclidean space $\mathbb{R}^n$ and that an upper bound
C on $\|r - uG\|^2$ is somehow known. We intend to find the
closest lattice point to this vector. As the
name suggests, sphere decoder algorithms start by drawing
a virtual n-dimensional hypersphere centered on r with radius
$\sqrt{C}$ and then enumerating the lattice points located inside
this hypersphere. By finding a new potential closest point, the 
radius of the hypersphere in which the points are enumerated 
is reduced, and the algorithm considers minimizing the metric 
over the lattice points inside the smaller hypersphere, also 
centered on r, with improved radius. However, the way that 
the first radius is calculated differs between the FP and SE 
algorithms. For the SE enumeration strategy, the radius of the 
examined hypersphere is first considered as infinity. We then 
continue by finding the first potential closest point, which is 
the Babai point [25] in our case, and defining a new upper 
bound. This speeds up the algorithm compared to the FP 
enumeration method where defining the initial upper bound 
is a critical issue. 

Fig. 1 illustrates an n-dimensional hypersphere with radius
$\sqrt{C}$ centered on a received vector r, which contains several
(n - 1)-dimensional layers. The basis vector $b_n$ is in the
same direction as the hypotenuse of the right triangles $\triangle ABC$
and $\triangle DEC$, while all the other basis vectors $b_1, \ldots, b_{n-1}$ lie
in the subspace spanned by one of these (n - 1)-dimensional
layers.

Starting from dimension n, after defining the initial upper
bound, the received vector

$$r = (r_1, r_2, \ldots, r_n) \in \mathbb{R}^n \tag{5}$$

is projected onto the lattice basis vectors $b_1, b_2, \ldots, b_n$. This
can be easily done by a simple matrix multiplication

$$e_n = rH, \tag{6}$$

where $e_n = (E_{n,1}, E_{n,2}, \ldots, E_{n,n}) \in \mathbb{R}^n$. Observe in particular
that $E_{n,n} = r_n H_{n,n}$ because of the lower-triangular
form (4). Knowing C and $E_{n,n}$, according to [4] the
corresponding range for the integer component $u_n$, which is
also intuitively clear from Fig. 1, is

$$\lceil -H_{n,n}\sqrt{C} + E_{n,n} \rceil \le u_n \le \lfloor H_{n,n}\sqrt{C} + E_{n,n} \rfloor, \tag{7}$$



where $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ denote the round up and round down
operations, respectively. Another difference between the FP and
SE enumeration methods shows up here. While FP examines
all the layers in the interval above sequentially,
the SE refinement is to follow the layers in a zigzag path.
In other words, the SE algorithm first examines the nearest 
(n— 1) -dimensional layer and then goes for the second nearest 
(n — l)-dimensional layer which is on the opposite side of r, 
viewed from the nearest layer. This is well known as the main 
contribution of the SE refinement to the Pohst enumeration 
strategy, and it is the reason why the first lattice point visited 
by SE algorithm is the same regardless of C, whereas with 
the FP algorithm, the first point typically lies close to the 
boundary. 
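
The zigzag order described above can be sketched as a generator. The name `zigzag` is an assumption; the `sign` convention (returning -1 for a non-positive argument) and the step update follow the rules used by the decoder pseudocode in this paper.

```python
import itertools

def sign(a):
    # convention used by the decoder: -1 for a <= 0, +1 for a > 0
    return 1 if a > 0 else -1

def zigzag(E):
    """Yield integers in order of non-decreasing distance from the real value E."""
    u = round(E)
    step = sign(E - u)
    while True:
        yield u
        u += step
        step = -step - sign(step)

first_five = list(itertools.islice(zigzag(0.9), 5))   # [1, 0, 2, -1, 3]
```

The first value is the nearest layer regardless of any bound C, which is why the first point visited by the SE strategy does not depend on the initial radius.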

Fig. 1. Snapshot of an n-dimensional hypersphere, divided into a stack of
(n - 1)-dimensional hyperspheres (layers).



After finding the potential (n - 1)-dimensional layer that
is to be examined, the next step is to calculate the orthogonal
displacement $y_n$ from the received vector r to this layer, which
is shown as line DE in Fig. 1. This can be obtained from
the similarity of triangles:

$$\triangle ABC \sim \triangle DEC \;\Rightarrow\; y_n = \frac{E_{n,n} - u_n}{H_{n,n}}. \tag{8}$$



In order to calculate the $E_{n-1,n-1}$ value that will be used later
on to calculate the $u_{n-1}$ range (12) and the $y_{n-1}$ displacement
(16), the received vector r is first projected onto the examined
(n - 1)-dimensional layer (9) and then onto the lattice basis
vectors (10). We use the notation $r_{n-1}$ for the projected
received vector r, where n - 1 denotes the dimension of the
layer that the received vector is projected on. Thanks to the
lower-triangular representation, the orthogonal projection of r
onto the (n - 1)-dimensional layer currently being investigated
affects only the last component of r. So, it is sufficient
to subtract $y_n$ from the nth element of r to reach

$$r_{n-1} = (r_1, r_2, \ldots, r_n - y_n), \tag{9}$$

where according to (6) and (8)

$$y_n = r_n - \frac{u_n}{H_{n,n}}.$$

This positions $r_{n-1}$ exactly on the right-angle vertex of
$\triangle DEC$. Projecting the vector $r_{n-1}$ onto the lattice basis vectors
can also be done by a simple matrix multiplication similar to
(6):

$$e_{n-1} = r_{n-1} H \tag{10}$$
$$= rH - (0, \ldots, 0, y_n) H$$
$$= e_n - y_n (H_{n,1}, \ldots, H_{n,n}), \tag{11}$$



where $e_{n-1} = (E_{n-1,1}, \ldots, E_{n-1,n-1}, E_{n-1,n})$, and
$E_{n-1,n} = E_{n,n} - y_n H_{n,n}$, which is also equal to $u_n$. Calculating
$E_{n-1,n-1} = E_{n,n-1} - y_n H_{n,n-1}$, which is the value
that should be multiplied by the lattice basis vector $b_{n-1}$ to
create the projected vector $r_{n-1}$, and substituting $\lambda_n = y_n^2$, the
corresponding range for $u_{n-1}$ is [4]

$$\lceil -H_{n-1,n-1}\sqrt{B_{n-1}} + E_{n-1,n-1} \rceil \le u_{n-1} \le \lfloor H_{n-1,n-1}\sqrt{B_{n-1}} + E_{n-1,n-1} \rfloor, \tag{12}$$

where $B_{n-1} = C - \lambda_n$ is the squared radius of the examined
(n - 1)-dimensional layer.
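
The first projection step can be checked numerically with a small sketch. The function name `project_one_layer` and the 0-based list layout are assumptions for illustration; the arithmetic follows (6), (8) and (11).

```python
def project_one_layer(H, r):
    """Compute e_n, u_n, y_n and e_{n-1} for a lower-triangular H = G^{-1}."""
    n = len(r)
    # (6): e_n = rH
    e_n = [sum(r[i] * H[i][j] for i in range(n)) for j in range(n)]
    u_n = round(e_n[-1])                  # nearest (n-1)-dimensional layer
    y_n = (e_n[-1] - u_n) / H[-1][-1]     # (8): displacement to that layer
    # (11): e_{n-1} = e_n - y_n * (last row of H)
    e_prev = [e_n[j] - y_n * H[-1][j] for j in range(n)]
    return e_n, u_n, y_n, e_prev

e_n, u_n, y_n, e_prev = project_one_layer([[1.0, 0.0], [-0.5, 1.0]], [0.2, 0.9])
```

In this example the last component of `e_prev` equals `u_n` up to rounding error, which reflects the remark that $E_{n-1,n}$ is equal to $u_n$.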

The sphere decoder is applied recursively to search this
(n - 1)-dimensional layer. Thereafter the next $u_n$ value in (7) is
generated and a new (n - 1)-dimensional layer is searched.
This search strategy can be illustrated as a depth-first search
[26] of the tree in Fig. 2. Once we reach the last node, the
bounds are updated recursively and the search backtracks to
the most recent node that has not yet been fully explored.

By a simple generalization, the projection
values in an i-dimensional layer, which are used later on in
(16) and (20), are obtained for i = 0, ..., n - 1 as

$$e_i = r_i H \tag{13}$$
$$= e_{i+1} - y_{i+1} (H_{i+1,1}, \ldots, H_{i+1,n}), \tag{14}$$

where $r_i$ is the received vector r projected onto an i-dimensional
layer, and

$$e_i = (E_{i,1}, \ldots, E_{i,i}, u_{i+1}, \ldots, u_n) \tag{15}$$

gives the coefficients of $r_i$ expressed as a linear combination
of the lattice basis vectors. Thus in a zero-dimensional layer,
which is a lattice point, $r_0 \in \Lambda(G, \mathbb{Z})$ and $e_0 = r_0 H \in \mathbb{Z}^n$.
Assuming an i-dimensional sphere similar to Fig. 1, the
orthogonal displacement between the projected vector $r_i$ and
the examined (i - 1)-dimensional layer is

$$y_i = \frac{E_{i,i} - u_i}{H_{i,i}}, \quad i = 1, \ldots, n. \tag{16}$$



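Based on the lower-triangular form and the interpretation that
$y_i$ only affects the ith component of $r_i$, for i = 1, ..., n

$$r_{i-1} = (r_1, \ldots, r_{i-1}, r_i - y_i, \ldots, r_n - y_n), \tag{17}$$

where, according to (13) and (16),

$$y_i = r_i - \frac{1}{H_{i,i}} \Big( u_i - \sum_{j=i+1}^{n} (r_j - y_j) H_{j,i} \Big), \quad i = 1, \ldots, n.$$

Similarly, the bounds for every i-dimensional layer are

$$\lambda_i = y_i^2 + y_{i+1}^2 + \cdots + y_n^2, \quad i = 1, \ldots, n, \tag{18}$$
$$B_i = C - \lambda_{i+1}, \quad i = 1, \ldots, n - 1, \tag{19}$$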
(19) 



where $\lambda_i$ is the squared distance from the received vector r
to the projected vector $r_{i-1}$, and $B_i$ is the squared radius of
the examined i-dimensional layer.

Fig. 2. The search tree of the sphere decoder, where arrows illustrate the
connection between the different parameters. Only parameters computed in
an implementation are included. Only one node at each level 1, ..., n is
expanded.

Hence, $\lambda_1$ can be interpreted
as the squared Euclidean distance between the received
vector r and a potential closest point $r_0$. Finally, the ranges for
the ith integer component of the u values for i = 1, ..., n - 1
are [4]



$$\lceil -H_{i,i}\sqrt{B_i} + E_{i,i} \rceil \le u_i \le \lfloor H_{i,i}\sqrt{B_i} + E_{i,i} \rfloor, \tag{20}$$

where $E_{i,i}$ is the value that should be multiplied by the lattice
basis vector $b_i$ to create the projected vector $r_i$ (10).
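
The bounds (18)-(20) translate directly into a few lines of code. The function names `layer_bounds` and `u_range` are assumptions made for this sketch, which expects a nonnegative squared radius.

```python
import math

def layer_bounds(y_tail, C):
    """Given y_{i+1}, ..., y_n and the bound C, return (lambda_{i+1}, B_i)."""
    lam = sum(v * v for v in y_tail)   # (18): squared distance accumulated so far
    return lam, C - lam                # (19): squared radius left for layer i

def u_range(H_ii, B_i, E_ii):
    """Integer interval (20) for u_i; it is empty if lower > upper."""
    half_width = H_ii * math.sqrt(B_i)
    return math.ceil(E_ii - half_width), math.floor(E_ii + half_width)
```

A larger diagonal element $H_{i,i}$ means more closely spaced layers, so more integer candidates fall inside the same radius.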

III. Projection Of The Received Vector 

In this section we propose our modification to the SE
closest point search algorithm, and also suggest how to
improve the FP method in [4]. Even though the SE refinement
of the FP strategy reduced the complexity of the sphere
decoder algorithm, there is still room for improvement.

A. The Standard Projection Method 

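Most of the numerical operations carried out in standard
sphere decoders are related to the projection of the received
vector r, or its lower-dimensional counterparts, onto the lattice
basis vectors while we are continuously moving up and down
in the hierarchy of layers and updating the $e_i$ vectors in Fig. 2.
The $E_{i,j}$ values of the e vectors can be collected in the matrix

$$E = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = \begin{pmatrix} E_{1,1} & u_2 & \cdots & u_n \\ E_{2,1} & E_{2,2} & \cdots & u_n \\ \vdots & \vdots & \ddots & \vdots \\ E_{n,1} & E_{n,2} & \cdots & E_{n,n} \end{pmatrix}.$$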

All elements are updated from the elements immediately
below, as shown in (14) and Fig. 2. However, the only values
that are required in the sphere decoder algorithms are the
diagonal elements $E_{i,i}$, used in (16) and (20). The
elements located above the diagonal of E do not need
to be calculated: they are equal to the u values
that have already been determined in the previous stages of
the algorithm, see (15). It is also obvious that $e_n$ (6) is
calculated just once, since there exists just a single n-dimensional
sphere.

The sphere decoder algorithm based on the SE enumeration
strategy proposed in [3] always updates the first i elements of
$e_i$ simultaneously. For instance, if we are in an i-dimensional
layer, after computing $E_{i,i}$ we update all $E_{i,j}$ (j = 1, ..., i - 1)
values located in the ith row of E. So, at each layer we
calculate all the projection values obtained by projecting
the vector onto the lattice basis vectors. These values will
then be used to update the $E_{i-1,j}$ (j = 1, ..., i - 1)
values when we are moving down in the layers. But why
should one project the entire vector $r_i$ onto the lattice basis
vectors, and calculate the other $E_{i,j}$ (j = 1, ..., i - 1) values
when they are not supposed to be used at that stage of the
algorithm? The answer to this question inspires an intelligent
algorithm to keep track of the projections and to update the $E_{i,j}$
values only when needed.



B. Smart (Vector) Projection Method 

The method described in Sec. III-A is far from
the most efficient way of projecting and
updating the $E_{i,j}$ values. The algorithm we propose to manage
the optimum projection of the received vector r is based on the
following criteria:

• As explained in Sec. III-A, we are just interested in the
elements located in the lower-triangular part of E.

• The last row of E, $e_n$, is calculated just once (6).

• According to (14) and (13), updating every $E_{j,i}$ element
($i \le j < n$) of E requires knowledge of both the $E_{j+1,i}$ and
$y_{j+1}$ values.

• Unlike the SE enumeration strategy in [3], which follows
a row-wise updating method, we follow a column-wise
updating strategy plus a significant refinement to
it that avoids starting from the nth layer and updating
all $E_{n-1,i}, E_{n-2,i}, \ldots, E_{i+1,i}$ elements located in the ith
column of E before calculating the objective $E_{i,i}$ value.

• If we move to an i-dimensional layer, the first i + 1
elements of $e_i$ and of the other e vectors above that row will
be affected, since we are projecting the received vector r
to this new i-dimensional layer. However, the elements
below $e_i$ will remain unaffected.

• Our main target at each stage, when we are moving
towards the lower-dimensional layers, is just to update
the $E_{i,i}$ values. The other $E_{j,i}$ values for j > i will be
updated only if they are needed to calculate the $E_{i,i}$ values;
otherwise they will remain untouched.

• We should keep track at all times of the layers up to which
we have moved downwards and upwards (more explanation
in Sec. IV).
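
The column-wise refresh implied by the criteria above can be sketched as a small helper. The name `update_diagonal`, the 1-based array layout and the returned update count are assumptions for illustration; the vector `d` holds, for each column, the row from which valid values are still available.

```python
def update_diagonal(E, H, y, d, k):
    """Refresh E[k][k] by walking down column k from row d[k] (1-based indices).

    E, H: (n+1) x (n+1) lists with row/column 0 unused; y, d: lists of length n+1.
    Rows d[k], ..., n of column k are assumed to be valid already.
    Returns the number of multiply-subtract updates performed.
    """
    updates = 0
    for i in range(d[k], k, -1):          # i = d_k, d_k - 1, ..., k + 1
        E[i - 1][k] = E[i][k] - y[i] * H[i][k]
        updates += 1
    return updates
```

Only the entries of column k that lie between the last valid row and the diagonal are touched, so a short excursion up and down the tree triggers only a short update.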

According to the preceding principles, a novel method for the
projection and calculation of the $E_{i,j}$ values is proposed in the next
section. With this method, the elements of
E are calculated in an optimal way. In Sec. V, where
the simulation results are presented, we will see how the
complexity of the sphere decoder algorithm, for both lattices
and finite constellations, is reduced due to optimum projection
of the received vector r.

Due to the evident performance gain that the sphere decoder
based on the SE enumeration strategy possesses compared to its
predecessors, we apply the proposed refinement
just to the SE algorithm. However, the discussed optimal
projection method can be applied to other recursive
enumeration strategies, including the FP enumeration method.
The $E_{i,i}$ (14) and $B_i$ (19) elements defined here are the same as the
$S_i$ and $T_i$ variables explained in [4], which can be updated with
the same strategy.

IV. The Proposed Algorithm 

A standalone representation of the new algorithm based
on the proposed smart vector projection method and the SE
enumeration strategy is given. The algorithm is intended to be
sufficiently detailed to allow a straightforward implementation,
even without knowledge of the underlying theory. The pseudocode
presented in Sec. IV-A is written for lattices. However,
the modification needed for finite constellations is included in
Sec. IV-B.

A. Sphere Decoder for Lattices 

The input parameters of the algorithm are the received vector r
(5) of size n and the n x n inverse matrix H of the lattice
generator matrix G. The output is an n-dimensional integer
vector $\hat{u} = (\hat{u}_1, \ldots, \hat{u}_n) \in \mathbb{Z}^n$, which gives a closest lattice
point $x \in \Lambda(G, \mathbb{Z})$ to the received vector r after a simple matrix
multiplication $x = \hat{u}G$. The function sign(a), which is used in
the algorithm, returns -1 if a ≤ 0 and 1 if a > 0, and the
round operation rounds its argument to the nearest integer.
Ties in the round operation can be broken arbitrarily.



Algorithm Decode(H, r)

    Initialization
 1  n = lattice dimension
 2  C = ∞
 3  k = n
 4  d_i = k,  i = 1, ..., n - 1
 5  λ_{k+1} = 0
 6  e_k = rH
 7  u_k = round(E_{k,k})
 8  y_k = (E_{k,k} - u_k) / H_{k,k}
 9  step_k = sign(y_k)
10  λ_k = y_k^2
11  LOOP
    Case(A)
12  do {
13    if (k ≠ 1) {
14      k = k - 1
15      E_{i-1,k} = E_{i,k} - y_i H_{i,k},  i = d_k, d_k - 1, ..., k + 1
16      u_k = round(E_{k,k})
17      y_k = (E_{k,k} - u_k) / H_{k,k}
18      step_k = sign(y_k)
19      λ_k = λ_{k+1} + y_k^2
20    } else {
21      û = u
22      C = λ_1
23    }
24  } while (λ_k < C)
25  min = k
    Case(B)
26  do {
27    if (k = n)
28      return û and exit
29    else {
30      k = k + 1
31      u_k = u_k + step_k
32      y_k = (E_{k,k} - u_k) / H_{k,k}
33      step_k = -step_k - sign(step_k)
34      λ_k = λ_{k+1} + y_k^2
35    }
36  } while (λ_k ≥ C)
37  max = k
    Case(C)
38  d_i = max,  i = max - 1, max - 2, ..., min
39  for (i = min - 1, min - 2, ..., 1) {
40    if (d_i < max)
41      d_i = max
42    else
43      exit the for loop
44  }
45  goto LOOP

As illustrated, after the initialization part the algorithm is
divided into three main subsections. We stay in Case(A) as long as
we are moving down the layers and the calculated squared Euclidean
distance $\lambda_k$ (18) between the received vector r and the
projected vector $r_{k-1}$ (17) is less than the squared Euclidean distance
C between the received vector r and the closest lattice point
detected so far. Otherwise, if $\lambda_k$ is not less than C, we
move up in the hierarchy of layers. This is done in Case(B).
Moreover, each time before leaving Case(A) and Case(B), we store
the lowest and highest layers that we have
reached while moving downwards and upwards, respectively, in
the variables min and max.

The algorithm to manage the projection and calculation of
the $E_{i,i}$ values is given in Case(C). The vector d is a 1 x
(n - 1) integer vector which denotes the layer from which we should
start updating the $E_{j,i}$ values for j > i in order to update
the objective $E_{i,i}$ value. For instance, $d_i = k$ indicates that in
order to update the $E_{i,i}$ value we should start the projection
from the kth layer, where k > i, and calculate all $E_{j,i}$ (j =
k - 1, ..., i) elements of E.

By taking out lines 4, 25, 37-44 and replacing $y_k$ with y
in lines 8-10, 17-19, 32, 34 and also replacing line 15 with

    $E_{k,i} = E_{k+1,i} - y H_{k+1,i}$,   i = 1, ..., k,

the proposed algorithm is changed back to the standard SE
algorithm in [3].
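
The pseudocode of this section can be transcribed into a short program. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the names `decode`, `lo` and `hi` (standing in for min and max, which are Python built-ins) and the 1-based helper arrays are choices made here, and no operation counting is attempted.

```python
import math

def sign(a):
    # convention from the text: -1 for a <= 0, +1 for a > 0
    return 1 if a > 0 else -1

def decode(H, r):
    """Return an integer vector u such that uG is a closest lattice point to r.

    H is the lower-triangular inverse of G (list of rows, positive diagonal).
    """
    n = len(r)
    # 1-based storage to mirror the pseudocode; index 0 is unused
    h = [[0.0] * (n + 1)] + [[0.0] + list(row) for row in H]
    E = [[0.0] * (n + 1) for _ in range(n + 1)]
    u, y, step = [0] * (n + 1), [0.0] * (n + 1), [0] * (n + 1)
    lam = [0.0] * (n + 2)            # lam[n + 1] = 0 (line 5)
    d = [n] * (n + 1)                # line 4
    C, k = math.inf, n
    for j in range(1, n + 1):        # line 6: e_n = rH
        E[n][j] = sum(r[i - 1] * h[i][j] for i in range(1, n + 1))
    u[k] = round(E[k][k])
    y[k] = (E[k][k] - u[k]) / h[k][k]
    step[k] = sign(y[k])
    lam[k] = y[k] ** 2
    u_hat = None
    while True:
        # Case (A): move down while the partial distance stays below C
        while True:
            if k != 1:
                k -= 1
                for i in range(d[k], k, -1):     # column-wise update (line 15)
                    E[i - 1][k] = E[i][k] - y[i] * h[i][k]
                u[k] = round(E[k][k])
                y[k] = (E[k][k] - u[k]) / h[k][k]
                step[k] = sign(y[k])
                lam[k] = lam[k + 1] + y[k] ** 2
            else:
                u_hat = u[1:]                    # new best point found
                C = lam[1]
            if not lam[k] < C:
                break
        lo = k
        # Case (B): move up, taking the next zigzag candidate at each level
        while True:
            if k == n:
                return u_hat
            k += 1
            u[k] += step[k]
            y[k] = (E[k][k] - u[k]) / h[k][k]
            step[k] = -step[k] - sign(step[k])
            lam[k] = lam[k + 1] + y[k] ** 2
            if lam[k] < C:
                break
        hi = k
        # Case (C): record from which row each column must be refreshed
        for i in range(lo, hi):
            d[i] = hi
        for i in range(lo - 1, 0, -1):
            if d[i] < hi:
                d[i] = hi
            else:
                break
```

For example, with `H = [[1.0, 0.0], [-0.5, 1.0]]` and `r = [0.2, 0.9]` the sketch returns `[0, 1]`, that is, the lattice point (0.5, 1) of the lattice generated by G = [[1, 0], [0.5, 1]].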

B. Sphere Decoder for Finite Constellations 

Here we present the modification needed to confine the new
algorithm, proposed in Sec. IV-A, to finite constellations. The
input parameters of the algorithm are the received vector r (5) of
size n, the n x n inverse matrix H (4) of the channel matrix G,
and the minimum ($U_{\min}$) and maximum ($U_{\max}$) levels of the
signal constellation, which is constructed from the consecutive
integers in (2). The output is an n-dimensional integer
vector $\hat{u} = (\hat{u}_1, \ldots, \hat{u}_n) \in U^n$, which gives a closest
lattice point $x \in \Lambda(G, U)$ to the received vector r after a
simple matrix multiplication $x = \hat{u}G$. The changes required
in the algorithm of Sec. IV-A are as follows.

• Replace lines 7 and 16 with u_k = roundc(E_{k,k})

• Replace lines 31-33 with
    y_k = ∞
    u_k = u_k + step_k
    step_k = -step_k - sign(step_k)
    if (U_min ≤ u_k ≤ U_max)
      y_k = (E_{k,k} - u_k) / H_{k,k}
    else {
      u_k = u_k + step_k
      step_k = -step_k - sign(step_k)
      if (U_min ≤ u_k ≤ U_max)
        y_k = (E_{k,k} - u_k) / H_{k,k}
    }

The roundc operation rounds off everything to the nearest 
integer number inside the constellation boundary U. 
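
The two changes listed above can be sketched as helper functions. The names `roundc` (taken from the text) and `next_candidate` (an assumption) are used for illustration; `next_candidate` mirrors the replacement for lines 31-33 and leaves y at infinity when neither of the two zigzag steps lands inside the constellation.

```python
import math

def sign(a):
    # convention from the text: -1 for a <= 0, +1 for a > 0
    return 1 if a > 0 else -1

def roundc(x, u_min, u_max):
    """Round x to the nearest integer inside [u_min, u_max]."""
    return min(max(round(x), u_min), u_max)

def next_candidate(u, step, E_kk, H_kk, u_min, u_max):
    """Try at most two zigzag steps; return the new (u, step, y)."""
    y = math.inf
    for _ in range(2):
        u += step
        step = -step - sign(step)
        if u_min <= u <= u_max:
            y = (E_kk - u) / H_kk
            break
    return u, step, y
```

An infinite y makes the partial distance exceed any finite C, so the search moves up a level once the candidates at the current level are exhausted.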

V. Simulation Results 

Herein, we evaluate the effectiveness of the proposed smart
vector projection technique on the sphere decoder algorithm
based on the SE enumeration strategy for both lattices and finite
constellations. For the sake of simplicity, we use the notation
Algorithm(I) for our proposed algorithm in Sec. IV and
Algorithm(II) for the standard sphere decoder algorithm in [3].
Both algorithms are implemented according to the pseudocode
presented in Sec. IV. The comparison results for lattices with
and without reduction are presented in Sec.
V-A. In Sec. V-B the two algorithms are
evaluated for finite constellations.

We did not use the execution time as a comparison measure
for the complexity of the two algorithms, since it
depends on many different factors. Running the two
algorithms under different compiler settings gives different
results. Even using the same compiler on different processors
changes the outcome considerably. Apart from that, we cannot
control the other programs that run in parallel during the
execution of the algorithm. Hence, to be more precise, we
base our comparison on counting the
number of operations that each algorithm performs to reach
the closest lattice point.

The numerical operations are divided into three different
groups: floating point operations (Flops), indexing operations
(Inx-Ops), and integer operations (Int-Ops). Unlike the
pseudocode in Sec. IV, our code does not use any matrix
representation, since one-dimensional arrays are simpler and
faster than matrices in programming languages. The indexing
operations are therefore counted for array-based rather than
matrix-based programming. We count any floating point addition,
subtraction, multiplication, division, and comparison as a floating
point operation. Similarly, any integer addition, subtraction,
multiplication, division, and comparison is counted as an
integer operation. The indexing operations deal with array
indexing. For instance, f(ki + j) addresses the (ki + j)th
element of array f in memory, and computing ki + j takes two indexing
operations, counted in the same way as integer operations. At some points we
refer several times to elements of an array whose index contains an
unchanging term. Specifically, if we refer to f(i + j + k)
in a for loop and i + j is the unchanging term, then as long as
the value of i + j does not change we count i + j as a single
indexing operation no matter how many times the for loop
is executed. Similarly, if two or more arrays with the same
unchanging i + j term are addressed, just a single indexing
operation is counted. On the other hand, we do not count the
for loop counters as numerical operations,
since loop counters are handled differently in different programming
languages. The
round operation is counted as a single floating point operation,
and roundc in Sec. IV-B is counted as one and two floating
point operations when 2-PAM and 4-PAM constellations are
investigated, respectively.

To compare the complexity of the two algorithms, we
generate M random generator matrices $G_1, \ldots, G_M$, and for
each matrix $G_j$ we generate N random received vectors
$r_{j,1}, \ldots, r_{j,N}$. The same vectors are decoded using both
algorithms and the operations are counted.

Depending on the application, three different
averaging schemes for comparing the complexity
of the two algorithms are

$$Av_1 = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} \frac{Ops_{II}(r_{j,i}, G_j)}{Ops_{I}(r_{j,i}, G_j)}, \tag{21}$$

$$Av_2 = \frac{1}{M} \sum_{j=1}^{M} \frac{\sum_{i=1}^{N} Ops_{II}(r_{j,i}, G_j)}{\sum_{i=1}^{N} Ops_{I}(r_{j,i}, G_j)}, \tag{22}$$

$$Av_3 = \frac{\sum_{j=1}^{M} \sum_{i=1}^{N} Ops_{II}(r_{j,i}, G_j)}{\sum_{j=1}^{M} \sum_{i=1}^{N} Ops_{I}(r_{j,i}, G_j)}, \tag{23}$$

where $Ops_{I}(r_{j,i}, G_j)$ and $Ops_{II}(r_{j,i}, G_j)$ denote the total
number of operations that Algorithm(I) and Algorithm(II), respectively,
carry out to decode the received vector $r_{j,i}$ in the lattice
generated by the generator matrix $G_j$. Here, $Ops_{I}$ and
$Ops_{II}$ can refer to any of the types of numerical operations discussed
above.
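
The three averaging schemes differ only in where the ratio is taken, which a short sketch makes explicit. The function name `averages` and the nested-list layout `ops[j][i]` (matrix j, received vector i) are assumptions for illustration.

```python
def averages(ops_I, ops_II):
    """ops_X[j][i]: operation count of Algorithm(X) for vector i of matrix j."""
    M, N = len(ops_I), len(ops_I[0])
    # (21): average of per-vector ratios
    av1 = sum(ops_II[j][i] / ops_I[j][i] for j in range(M) for i in range(N)) / (M * N)
    # (22): average over matrices of per-matrix ratios of totals
    av2 = sum(sum(ops_II[j]) / sum(ops_I[j]) for j in range(M)) / M
    # (23): ratio of grand totals
    av3 = sum(map(sum, ops_II)) / sum(map(sum, ops_I))
    return av1, av2, av3

av1, av2, av3 = averages([[1, 2], [4, 4]], [[2, 2], [4, 12]])
```

In the ratio of grand totals, a single matrix with very large operation counts dominates the result, whereas the other two schemes weight every vector or every matrix equally.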



A. Lattices 

We generate the lattice generator matrices with random
numbers, drawn from i.i.d. zero-mean, unit-variance Gaussian
distributions. The random input vectors are generated
uniformly inside a Voronoi region according to [27].

Our simulation results are based on averaging over M = 50
different generator matrices. However, the number of input
vectors depends on the dimension of the lattices. The higher
the dimension is, the fewer input vectors are examined, as long as
the plotted curves remain reasonably
smooth.

TABLE I
AVERAGE NUMBER OF NUMERICAL OPERATIONS ACCORDING
TO EQUATION (24) WITHOUT ANY REDUCTION

            |          Algorithm(I)           |          Algorithm(II)
Dimension   | Flops     Inx-Ops    Int-Ops    | Flops      Inx-Ops    Int-Ops
5           | 3.2e+2    2.0e+2     1.6e+2     | 3.4e+2     2.4e+2     1.5e+2
10          | 6.0e+3    3.6e+3     2.9e+3     | 8.1e+3     6.1e+3     2.7e+3
20          | 2.1e+5    1.4e+5     1.0e+5     | 3.9e+5     3.2e+5     0.9e+5
30          | 1.5e+7    1.0e+7     0.7e+7     | 3.7e+7     3.3e+7     0.6e+7
40          | 6.0e+8    3.8e+8     2.7e+8     | 18.2e+8    16.4e+8    2.4e+8
50          | 4.1e+10   2.6e+10    1.8e+10    | 15.3e+10   14.1e+10   1.6e+10
60          | 8.4e+11   5.4e+11    3.8e+11    | 34.9e+11   32.2e+11   3.3e+11


Tab. J] gives the average numbers of numerical operations, 



Av 



MN 



M N 
3=1 i=l 



(24) 



for both algorithms and all three types of operations. It
illustrates that most of the operations in the two algorithms are
Flops and Inx-Ops, especially as the dimension increases.
For instance, at dimension 60 in the standard Algorithm(II), the
Flops and Inx-Ops are each roughly 10 times more numerous than the Int-Ops.
As a result, it can be concluded that Flops and Inx-Ops are the
dominant factors in the complexity of the algorithms.
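
The double average in (24) and the dimension-60 comparison can be illustrated with a short sketch. The function name `average_ops` and the operation counts in `ops_example` are hypothetical and serve only to show the arithmetic; the three dimension-60 values are the rounded Algorithm(II) entries of Tab. I.

```python
def average_ops(ops):
    """Average operation count over M lattices and N input vectors each.

    ops[j][i] is the number of operations spent decoding input vector i
    in lattice j, i.e. the summand of the double sum in (24).
    """
    M = len(ops)
    N = len(ops[0])
    if any(len(row) != N for row in ops):
        raise ValueError("every lattice needs the same number of input vectors")
    return sum(sum(row) for row in ops) / (M * N)


# Hypothetical counts for M = 2 lattices and N = 3 input vectors each.
ops_example = [[120, 80, 100], [300, 260, 340]]
print(average_ops(ops_example))  # 200.0

# Rounded Tab. I entries for Algorithm(II) at dimension 60.
flops_60, inx_60, int_60 = 34.9e11, 32.2e11, 3.3e11
flops_to_int = flops_60 / int_60
inx_to_int = inx_60 / int_60
print(round(flops_to_int, 1), round(inx_to_int, 1))
```

Both ratios come out close to 10, in line with the statement above, bearing in mind that the table entries are rounded to two or three significant digits.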

The lattices have also passed through a preprocessing
stage, and the complexity is compared once the so-called
LLL-reduced bases are obtained. Figs. 3-5 are plotted for the different
types of numerical operations based on the $\mathrm{Av}_1$ averaging
scheme in (21), without taking into account the operations
needed for the reduction. Figs. 3-4 show that the gain with
Algorithm(I) increases linearly with the dimension, for both
Flops and Inx-Ops. The drawback is a somewhat larger number
of Int-Ops, as shown in Fig. 5, but the penalty converges
to a mere 15% increase at high dimensions, which according
to Tab. I has quite a small effect on the overall complexity. The
use of reduction does not change the ratios substantially,
although the absolute numbers decrease.
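
As a rough cross-check of these trends, the ratios of the rounded Tab. I averages (without reduction) can be computed directly. This is only a sketch: the figures are based on the $\mathrm{Av}_1$ scheme, whereas Tab. I follows (24), so the two need not coincide exactly. The names `ALG_I`, `ALG_II`, and `ratios` are introduced here for illustration.

```python
# Tab. I averages without reduction: dimension -> (Flops, Inx-Ops, Int-Ops).
ALG_I = {
    5: (3.2e2, 2.0e2, 1.6e2),
    10: (6.0e3, 3.6e3, 2.9e3),
    20: (2.1e5, 1.4e5, 1.0e5),
    30: (1.5e7, 1.0e7, 0.7e7),
    40: (6.0e8, 3.8e8, 2.7e8),
    50: (4.1e10, 2.6e10, 1.8e10),
    60: (8.4e11, 5.4e11, 3.8e11),
}
ALG_II = {
    5: (3.4e2, 2.4e2, 1.5e2),
    10: (8.1e3, 6.1e3, 2.7e3),
    20: (3.9e5, 3.2e5, 0.9e5),
    30: (3.7e7, 3.3e7, 0.6e7),
    40: (18.2e8, 16.4e8, 2.4e8),
    50: (15.3e10, 14.1e10, 1.6e10),
    60: (34.9e11, 32.2e11, 3.3e11),
}


def ratios(dim):
    """Return Algorithm(II)/Algorithm(I) ratios for (Flops, Inx-Ops, Int-Ops)."""
    return tuple(b / a for a, b in zip(ALG_I[dim], ALG_II[dim]))


for dim in sorted(ALG_I):
    flops, inx, integer = ratios(dim)
    print(f"{dim:2d}  Flops {flops:.2f}  Inx-Ops {inx:.2f}  Int-Ops {integer:.2f}")
```

At dimension 60 the Flops ratio is about 4.2, while the Int-Ops ratio of about 0.87 corresponds to roughly 15% more integer operations in Algorithm(I).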



[Figure 3: plot titled "Flops"; curves: Without Reduction, With LLL Reduction.]

Fig. 3. Comparison of average floating point operations between Algorithm(I)
and Algorithm(II) based on the $\mathrm{Av}_1$ averaging scheme in (21).



[Figure 4: horizontal axis: Dimension (5 to 60); curves: Without Reduction, With LLL Reduction.]

Fig. 4. Comparison of average indexing operations between Algorithm(I)
and Algorithm(II) based on the $\mathrm{Av}_1$ averaging scheme in (21).

[Figure 5: plot titled "Int-Ops"; horizontal axis: Dimension (5 to 60); curves: Without Reduction, With LLL Reduction.]

Fig. 5. Comparison of average integer operations between Algorithm(I) and
Algorithm(II) based on the $\mathrm{Av}_1$ averaging scheme in (21).



Plotting the figures according to the other proposed averaging
schemes, one would notice that the $\mathrm{Av}_2$ curves (22) are exactly
the same as the plotted $\mathrm{Av}_1$ curves, while in the case of $\mathrm{Av}_3$
(23) some minor fluctuations may appear (see Tab. I), due to the
domination of a few ill-conditioned lattice generator matrices.

B. Finite Constellations 

In communications theory, one of the most important
applications in which the sphere decoder algorithm
arises is ML detection for MIMO channels. Assuming perfect
channel estimation under fading conditions, the channel matrix G
changes continually over time, at a rate that depends on
the speed of the channel variations.
Considering that the rate of these variations is almost
constant during the data transmission, the best method to
compare the performance of the two ML decoder
algorithms is the second averaging scheme $\mathrm{Av}_2$ in (22).
Hence, Algorithm(I) and Algorithm(II) are
constructed for finite constellations according to Sec. IV, and their
complexity is compared for 2-PAM and 4-PAM finite constellations
without considering any reduction.

[Figure 6: plot titled "Flops"; horizontal axis: Dimension; curves: SNR=0dB, SNR=5dB, SNR=10dB.]

Fig. 6. Comparison of average floating point operations between Algorithm(I)
and Algorithm(II) for 2-PAM constellation for different SNRs based on the
$\mathrm{Av}_2$ averaging scheme in (22).

[Figure 7: plot titled "Flops"; horizontal axis: Dimension; curves: SNR=0dB, SNR=5dB, SNR=10dB.]

Fig. 7. Comparison of average floating point operations between Algorithm(I)
and Algorithm(II) for 4-PAM constellation for different SNRs based on the
$\mathrm{Av}_2$ averaging scheme in (22).

The system model given earlier is considered for an L-PAM
constellation, where the average symbol energy of the
constellation, $E_s$, is calculated from the signal set
$\{-\frac{L-1}{2}, -\frac{L-1}{2}+1, \ldots, \frac{L-1}{2}\}$, and the SNR is defined as $E_b/N_0$, where
$E_b = E_s/\log_2 L$ is the average energy per bit and $N_0/2$ is the
double-sided noise spectral density.
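
These definitions can be turned into numbers with a short sketch. It assumes equiprobable symbols, the signal set $\{-\frac{L-1}{2}, \ldots, \frac{L-1}{2}\}$ with unit spacing, and a noise variance of $N_0/2$ per real dimension; the function names are hypothetical.

```python
import math


def pam_constellation(L):
    """Return the L-PAM points -(L-1)/2, -(L-1)/2 + 1, ..., (L-1)/2."""
    return [k - (L - 1) / 2 for k in range(L)]


def symbol_energy(L):
    """Average symbol energy E_s, assuming equiprobable symbols."""
    points = pam_constellation(L)
    return sum(p * p for p in points) / L


def noise_variance(L, snr_db):
    """Noise variance N_0/2 per real dimension for SNR = E_b/N_0 in dB."""
    e_b = symbol_energy(L) / math.log2(L)  # average energy per bit
    n_0 = e_b / (10 ** (snr_db / 10))      # from SNR = E_b / N_0
    return n_0 / 2


for L in (2, 4):
    print(L, pam_constellation(L), symbol_energy(L), noise_variance(L, 0))
```

For this signal set the average symbol energy agrees with the closed form $(L^2-1)/12$, giving 0.25 for 2-PAM and 1.25 for 4-PAM.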

The results presented here are based only on floating point
operations, averaged over 50 different zero-mean, unit-variance
Gaussian channel matrices. Fig. 7 illustrates that for the 4-PAM
constellation, the curves diverge less as the SNR increases
than for the 2-PAM constellation in Fig. 6. In all cases,
the Flops gain increases linearly with the dimension, and the
gain is higher at low SNRs. The results presented for higher
dimensions are applicable to coded transmission [28].

VI. Conclusion 

In this paper, we have investigated state-of-the-art sphere
decoders and removed most of the numerical operations without
compromising the performance. The algorithm in Sec.
IV-B performs ML detection in finite lattice subsets and is
hence suitable for MIMO detection, while the variant in Sec.
IV-A performs closest-point search in (infinite) lattices. We
believe that the new algorithms are the fastest known
for both purposes.